% please textwrap! It helps svn not have conflicts across a multitude of
% lines.
%
% vim:set textwidth=78:

\section{Introduction}
In this report, we examine techniques for building models from data and
applying them to make optimal decisions. Accurate models matter because they
increase the number of correct decisions, which may in turn increase profits.
We illustrate our techniques by building models from the KDD Cup 1998
learning data \cite{kdd_cup_data} and using them to predict whether the
non-profit national veterans organization should send the 97NK direct
mailing solicitation to each prospective donor in order to maximize profits.

For this assignment we combine two models: a linear regression model, built
to predict the field TARGET\_D of the dataset, and a logistic regression
model, built to predict the field TARGET\_B. Each model predicts its
respective field for the test set, and the product of the two predictions is
used to identify whether a person will respond to the solicitation and
thereby donate a non-zero amount.
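
As a sketch of how the two predictions can be combined, suppose (this
notation and the conditioning are our assumptions, not taken from the dataset
documentation) that $\hat{p}(x)$ is the logistic regression estimate of
$P(\mathrm{TARGET\_B} = 1 \mid x)$ for a person with features $x$, and that
$\hat{d}(x)$ is the linear regression estimate of
$E[\mathrm{TARGET\_D} \mid x, \mathrm{TARGET\_B} = 1]$. Assuming TARGET\_D is
zero whenever TARGET\_B is zero, the product then estimates the expected
donation:
\[
  E[\mathrm{TARGET\_D} \mid x]
    = P(\mathrm{TARGET\_B} = 1 \mid x) \,
      E[\mathrm{TARGET\_D} \mid x, \mathrm{TARGET\_B} = 1]
    \approx \hat{p}(x) \, \hat{d}(x).
\]
With a hypothetical per-mailing cost $c$, one natural rule would be to solicit
a person only when $\hat{p}(x) \, \hat{d}(x) > c$.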

The remainder of this report is organized as follows. Section \ref{sec:opt}
explains the problem and the optimal decision criteria. Section
\ref{sec:flow} describes how we preprocess the data and select features to
create the final training model. Section \ref{sec:results} presents the
predictions obtained on the previously unused test set after applying our
final training model. Section \ref{sec:conclusion} concludes.

\ignore{
In this report, we examine techniques to identify rare examples in large data
sets. As Elkan explains in \cite{class_notes}, the interesting examples tend to
be the rare ones like detecting credit card fraud. Consequently, we focus on
identifying whether a direct mailing will result in a donation using the KDD
Cup 1998 learning data \cite{kdd_cup_data}, where only about $5\%$ of the
examples are positive. Identifying these examples is interesting because they
might bring insight into how to improve returns for the future. Similarly,
building models to identify these rare events helps produce better estimates
to guide decisions for the future.

To illustrate these techniques, we will build models using a linear support
vector machine (SVM) and a Naive Bayes learner to accurately predict whether
a direct mailing results in a response.

}
